Rapid Mixing in Markov Chains
Abstract

A wide class of "counting" problems has been studied in Computer Science. Three typical examples are the estimation of (i) the permanent of an n × n 0-1 matrix, (ii) the partition function of certain n-particle Statistical Mechanics systems and (iii) the volume of an n-dimensional convex set. These problems can be reduced to sampling from the steady state distribution of implicitly defined Markov Chains with an exponential (in n) number of states. The focus of this talk is the proof that such Markov Chains converge to the steady state fast (in time polynomial in n).

A combinatorial quantity called conductance is used for this purpose. There are other techniques as well, which we briefly outline. We then illustrate them on the three examples and briefly mention other examples.
1. Examples

We consider "counting problems", where there is an implicitly defined finite set X and one wishes to compute |X| exactly or approximately. In many situations, the approximate counting problem can be reduced to the problem of generating an element of X uniformly at random (the random generation problem). This reduction is often the relatively easier part. The generation problem is then solved by devising a Markov Chain with set of states X and uniform steady state probabilities, and then showing that this chain "mixes rapidly", i.e., is close to the steady state distribution after a number of steps bounded above by a polynomial in the length of the input. [The proof of rapid mixing is often the challenging part.] We will illustrate the problem settings and scope of the area by means of three examples in this section. Then we will outline some tools for proving rapid mixing and describe very briefly how the tools are applied in some examples. This paper presents a cross-section of methods and results from the area. A more comprehensive survey can be found in [19].

Our first example is the permanent of an n × n matrix A. Valiant showed that the exact computation of the permanent is #P-hard, i.e., every problem in a class of problems called #P is reducible to the exact computation of the permanent of a matrix; thus it is conjectured that it is not solvable in polynomial time. The hardness result holds even for the case with each entry a 0 or a 1, whence the problem is to find |X| where $X = \{\sigma \in S_n : A_{i\sigma(i)} = 1 \ \forall i\}$. Note that X here is implicitly defined by A. In the general case, we may think of A as specifying a weight $\prod_i A_{i\sigma(i)}$ on each permutation $\sigma \in S_n$.
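To make the set X concrete, here is a deliberately naive Python sketch (function names are ours) that enumerates all n! permutations. Its exponential running time is exactly what the sampling approach is meant to avoid; it is only usable for tiny n.

```python
from itertools import permutations

def permanent(A):
    """Sum over permutations sigma of prod_i A[i][sigma(i)]; n! time, illustration only."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += prod
    return total

def count_X(A):
    """|X| for a 0-1 matrix A: permutations sigma with A[i][sigma(i)] = 1 for every i."""
    n = len(A)
    return sum(all(A[i][s[i]] == 1 for i in range(n)) for s in permutations(range(n)))

A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(permanent(A), count_X(A))  # prints 2 2
```

For a 0-1 matrix the two functions agree, since each product is 1 exactly when σ ∈ X and 0 otherwise.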

As usual, we measure running time as a function of n, a natural parameter of the problem (like the n above), and 1/ε, where ε > 0 is the relative error allowed. Our primary aim is a polynomial (in n, 1/ε) time bounded algorithm; but we will also discuss methods which help improve the polynomial. A recent breakthrough due to Jerrum, Sinclair and Vigoda [22] gives an approximation algorithm with such a time bound for the permanent (of a matrix with non-negative entries), settling this important open problem.

Our second example starts with the classical problem of computing the volume of a compact convex set K in Euclidean n-space $\mathbb{R}^n$. Dyer, Frieze and the author [17] gave a polynomial time algorithm for estimating the volume to any specified relative error ε. They first reduce the problem to that of drawing a random point from the convex set (with uniform probability density). They then impose a grid on space and do a "coordinate random walk": from the current grid point x in K, pick one of the 2n coordinate neighbours y of x at random and go to y if y ∈ K; otherwise, stay at x. Under mild conditions, it is easy to show that the steady state distribution is uniform (over the grid points in the set); they show that in a polynomial (in n) number of steps, we are "close" to the steady state. [The number of states of the chain can be exponentially large.]
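The coordinate walk is simple enough to state directly in code. Below is a minimal Python sketch, assuming K is given by a membership oracle `in_K` (a hypothetical interface, not from the text) and that the grid is h·Z^n for a spacing h chosen by the caller. Integer grid coordinates are kept internally so that floating-point drift cannot move the walk off the grid.

```python
import random

def grid_walk(in_K, k0, h, num_steps, rng):
    """Coordinate random walk on the grid h * Z^n, restricted to K.

    in_K: membership oracle for K (hypothetical interface); k0: integer grid
    coordinates of a starting grid point that lies in K.
    """
    k = list(k0)
    n = len(k)
    for _ in range(num_steps):
        y = list(k)
        y[rng.randrange(n)] += rng.choice((-1, 1))  # one of the 2n coordinate neighbours
        if in_K([h * c for c in y]):                # move only if the neighbour is in K
            k = y
    return [h * c for c in k]

# Example: K = unit ball in R^3, grid spacing h = 0.1.
in_ball = lambda p: sum(c * c for c in p) <= 1.0
rng = random.Random(1)
pts = [grid_walk(in_ball, [0, 0, 0], 0.1, 1000, rng) for _ in range(100)]
```

Because every proposal has probability 1/(2n) and rejected proposals stay put, the transition matrix is symmetric, which is why the steady state is uniform over the grid points in K.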

Lovasz and Simonovits have devised a continuous state space random walk, called the "ball walk", which performs better. In this walk, we choose at the outset a "step size" δ > 0. From the current point x, we pick at random (with uniform density) a point y in the ball of radius δ centered at x. We go to y if it is in K; otherwise, we stay at x.
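A corresponding sketch of the ball walk in Python, again assuming a membership oracle `in_K` (hypothetical name). The uniform point in the ball of radius δ is drawn in the standard way: a normalized Gaussian vector gives a uniform direction, and the radius is δU^{1/n} with U uniform on [0, 1).

```python
import math
import random

def uniform_in_ball(center, delta, rng):
    """Uniform point in the ball of radius delta around center."""
    n = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian vector => uniform direction
    norm = math.sqrt(sum(c * c for c in g))
    r = delta * rng.random() ** (1.0 / n)         # radius density proportional to r^(n-1)
    return [c + r * gi / norm for c, gi in zip(center, g)]

def ball_walk(in_K, x0, delta, num_steps, rng):
    """Ball walk: propose y uniform in B(x, delta); move if y is in K, else stay."""
    x = list(x0)
    for _ in range(num_steps):
        y = uniform_in_ball(x, delta, rng)
        if in_K(y):
            x = y
    return x

# Example: K = the unit cube [0, 1]^3, step size delta = 0.2.
in_cube = lambda p: all(0.0 <= c <= 1.0 for c in p)
rng = random.Random(5)
pts = [ball_walk(in_cube, [0.5, 0.5, 0.5], 0.2, 200, rng) for _ in range(200)]
```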

More generally, we may consider the integration (a "continuous" analog of counting) of a function over a convex set K. Of particular interest are logarithmically concave functions (a positive real valued function F is log-concave over a domain if log F is concave over the domain), since many familiar families of probability density functions, like the multivariate normal, are log-concave. One may use the Metropolis version of the random walks for convex sets (cf. Section 2). Rapid mixing has been proved for this general case too.

Our third set of examples concerns the Ising model and other Statistical Mechanics problems (see [21] and references there). The computational problem arising from the Ising model is the following: we are given a real symmetric n × n matrix V (the entries of V arise as pairwise interaction energies), a real number B (the external field) and a positive real number β (the inverse temperature). The Ising partition function is defined as

$$Z = Z(V, B, \beta) = \sum_{\sigma \in \{-1,+1\}^n} e^{-\beta H(\sigma)}, \qquad \text{where } H(\sigma) = -\sum_{i,j} V_{ij}\sigma_i\sigma_j - B\sum_k \sigma_k.$$

Jerrum and Sinclair [23] presented a polynomial time approximation algorithm to compute Z in the case when all $V_{ij}$ are non-negative (called the ferromagnetic case).
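To make the definition concrete, here is a brute-force evaluation of Z over all 2^n spin vectors (Python; the function name is ours). It takes exponential time, which is why approximation via sampling is needed. The double sum runs over all ordered pairs (i, j), exactly as H(σ) is written.

```python
import math
from itertools import product

def ising_partition_function(V, B, beta):
    """Brute-force Z = sum over sigma in {-1,+1}^n of exp(-beta * H(sigma))."""
    n = len(V)
    Z = 0.0
    for sigma in product((-1, 1), repeat=n):
        # H(sigma) = -sum_{i,j} V_ij sigma_i sigma_j - B * sum_k sigma_k
        H = -sum(V[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))
        H -= B * sum(sigma)
        Z += math.exp(-beta * H)
    return Z

# A single spin in a field B = 1 at beta = 1 gives Z = e + 1/e.
print(ising_partition_function([[0.0]], 1.0, 1.0))
```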

Their algorithm for the ferromagnetic case first reduces the problem to the corresponding sampling problem and then, more interestingly, reduces this sampling problem to another one, where we are given (explicitly) a graph G(V, E) with positive edge weights w(e). The problem is to pick a subset of edges of G at random such that the probability of picking a particular subset T is proportional to

$$w(T) = \mu^{|\mathrm{odd}(T)|} \prod_{e \in T} w(e),$$

where μ is a given positive number and odd(T) denotes the set of odd degree vertices in T. [Note that in this case X is the set of all subsets of edges and we have probabilities π on X as given above, where X, π are implicitly defined by giving



2. Preliminaries, eigenvalue connection 

Most of what we say extends naturally to continuous state space chains (where the set of states is (possibly uncountably) infinite) under mild measurability conditions, but for ease of notation we state it here for chains with a finite number of states. If P is the transition probability matrix, with $P_{xy}$ denoting the probability of transition from state x to state y, then for any natural number t, the matrix power $P^t$ gives the t-step transition probabilities. All our chains will be connected and aperiodic and thus have steady state probabilities: $\pi(y) = \lim_{t \to \infty} P^t_{xy}$ exists and is independent of the start state x. [The notation π(·) will be used throughout for steady state probabilities.] We let the vector $p^{(t)} = p^{(0)} P^t$ denote the probabilities at time t, where we start with the initial distribution $p^{(0)}$. All our chains will be "time-reversible", i.e., $\pi(x) P_{xy} = \pi(y) P_{yx}$ will hold for all pairs x, y.

From linear algebra, we get that the eigenvalues of P are real and satisfy $1 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_N \ge -1$ (where N is the number of states). Standard techniques yield:

Theorem 1. For a finite time-reversible Markov Chain, with $\pi_0 = \min_x \pi(x)$, for any t,

$$\sum_x |p^{(t)}(x) - \pi(x)| \le \frac{1}{\pi_0}\,[\max(|\lambda_2|, |\lambda_N|)]^t.$$
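For a two-state chain everything in Theorem 1 can be written down by hand: with P = ((1 − a, a), (b, 1 − b)) the steady state is π = (b, a)/(a + b) and the only eigenvalue other than 1 is 1 − a − b. The following Python sketch (a toy example of ours, not from the text) iterates p^{(t)} = p^{(0)}P^t and compares the distance with the bound of Theorem 1.

```python
def two_state_distance(a, b, p0, t):
    """sum_x |p^(t)(x) - pi(x)| for the chain P = [[1-a, a], [b, 1-b]]."""
    p = list(p0)
    for _ in range(t):
        p = [p[0] * (1 - a) + p[1] * b, p[0] * a + p[1] * (1 - b)]
    pi = [b / (a + b), a / (a + b)]
    return abs(p[0] - pi[0]) + abs(p[1] - pi[1])

def theorem1_bound(a, b, t):
    """(1/pi_0) * max(|lambda_2|, |lambda_N|)^t; here the only other eigenvalue is 1 - a - b."""
    pi0 = min(a, b) / (a + b)
    return abs(1 - a - b) ** t / pi0

a, b = 0.3, 0.1
for t in range(6):
    print(t, two_state_distance(a, b, [1.0, 0.0], t), theorem1_bound(a, b, t))
```

Here π = (0.25, 0.75), and from the start (1, 0) the distance is exactly 1.5 · 0.6^t against the bound 4 · 0.6^t: the decay rate λ_2^t is the right one, and the 1/π_0 factor costs only a constant.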

Modifying a Markov Chain by making it stay at the current state with probability 1/2 and move according to its transition function with probability 1/2 ensures that $\lambda_N \ge 0$, while only increasing the (expected) running time by a factor of 2; so in the maximum above, we need only consider $\lambda_2$. We call a chain "lazy" if $P_{xx} \ge 1/2$ for all x. We will use the phrase mixing time to denote the least positive real T such that for any $p^{(0)}$, $\sum_x |p^{(T)}(x) - \pi(x)| \le 1/4$. It is known that then, for $t \ge T \log(1/\varepsilon)$, we have $\sum_x |p^{(t)}(x) - \pi(x)| \le \varepsilon$.
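The definitions of a lazy chain and of the mixing time can be checked directly on small chains. The Python sketch below (names ours) forms (I + P)/2 and finds the smallest positive integer t at which every point-mass start is within 1/4 of π in the distance used above. Checking point masses suffices because the distance is convex in p^{(0)}.

```python
def lazy(P):
    """Stay with probability 1/2, otherwise move according to P."""
    n = len(P)
    return [[0.5 * P[x][y] + (0.5 if x == y else 0.0) for y in range(n)] for x in range(n)]

def mixing_time(P, pi, max_steps=10_000):
    """Smallest positive integer t with sum_y |P^t(x, y) - pi(y)| <= 1/4 for every x."""
    n = len(P)
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]  # p^(0) = e_x
    for t in range(1, max_steps + 1):
        rows = [[sum(r[z] * P[z][y] for z in range(n)) for y in range(n)] for r in rows]
        if max(sum(abs(r[y] - pi[y]) for y in range(n)) for r in rows) <= 0.25:
            return t
    raise RuntimeError("did not mix within max_steps")

# The periodic two-state chain never mixes, but its lazy version does in one step.
flip = [[0.0, 1.0], [1.0, 0.0]]
print(mixing_time(lazy(flip), [0.5, 0.5]))
```

The periodic chain `flip` shows why laziness matters: its distribution alternates between the two point masses forever, so `mixing_time(flip, ...)` raises, while the lazy version is already at π after one step.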

If we have a time-reversible Markov Chain on a finite set of states with transition probability matrix P and steady state probabilities π(x), and F is a positive real valued function on the states, there is a simple modification of the chain, called the Metropolis modification, with steady state probabilities $\pi(x)F(x)/\sum_y \pi(y)F(y)$. It has transition probabilities $P'_{xy} = P_{xy}\min(1, F(y)/F(x))$ for $x \ne y$, with the remaining probability assigned to staying at x. This construction is used in many instances, including, as mentioned in the introduction, sampling according to log-concave functions.
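A direct transcription of the Metropolis modification for a chain given as a dense matrix (Python; names ours). The example checks numerically that π(x)F(x), normalized, is stationary for a symmetric base chain, and that detailed balance holds for a non-symmetric one.

```python
def metropolis(P, F):
    """Metropolis modification: P'_xy = P_xy * min(1, F(y)/F(x)) for x != y."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                Q[x][y] = P[x][y] * min(1.0, F[y] / F[x])
        Q[x][x] = 1.0 - sum(Q[x][y] for y in range(n) if y != x)  # rejected moves stay at x
    return Q

# Base chain: walk on a triangle (uniform steady state); target proportional to F.
P3 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
F = [1.0, 2.0, 3.0]
Q = metropolis(P3, F)
target = [f / sum(F) for f in F]

# Non-symmetric base chain with pi = (0.25, 0.75); the target pi * F normalizes to (0.4, 0.6).
Q2 = metropolis([[0.7, 0.3], [0.1, 0.9]], [2.0, 1.0])
```

Reversibility follows because $\pi(x)F(x)P'_{xy} = \pi(x)P_{xy}\min(F(x), F(y))$, which is symmetric in x and y whenever the base chain is reversible.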



3. Techniques for proving rapid mixing 

3.1. Conductance 

Alon and Milman [3] and Sinclair and Jerrum related $\lambda_2$ to a combinatorial quantity called "conductance" (in what may be looked on as a discrete analog of Cheeger's inequality for manifolds). This has turned out to be of great use in practice; often, first proofs of polynomial time convergence use conductance.

For any two subsets S, T of states, the ergodic flow from S to T (denoted Q(S, T)) is defined as $Q(S,T) = \sum_{x \in S,\, y \in T} \pi(x) P_{xy}$. Writing $\bar S$ for the complement of S, the conductance Φ is defined by

$$\Phi(S) = \frac{Q(S, \bar S)}{\pi(S)}, \qquad \Phi = \min_{S:\ 0 < \pi(S) \le 1/2} \Phi(S).$$

Φ(S) is the probability of escaping from S to $\bar S$ in one step, conditioned on starting in S in the steady state; since $p^{(0)}$ may be this distribution, it is intuitively clear that if the conductance of some set is low, then the mixing time is high. More interestingly, they also show a converse.
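For small chains the conductance can be computed by brute force over all subsets. The Python sketch below (names ours) takes the minimum over sets with 0 < π(S) ≤ 1/2 and applies it to the lazy walk on the 3-cube discussed below, where a half-cube attains Φ = 1/(2n) = 1/6.

```python
from itertools import combinations

def conductance(P, pi):
    """Phi = min over S with 0 < pi(S) <= 1/2 of Q(S, S^c) / pi(S), enumerating all S."""
    n = len(P)
    best = float("inf")
    for size in range(1, n):
        for S in combinations(range(n), size):
            pi_S = sum(pi[x] for x in S)
            if pi_S > 0.5 + 1e-12:
                continue
            inside = set(S)
            flow = sum(pi[x] * P[x][y] for x in S for y in range(n) if y not in inside)
            best = min(best, flow / pi_S)
    return best

def lazy_hypercube(n):
    """Lazy walk on {0,1}^n, states encoded as integers 0 .. 2^n - 1."""
    N = 2 ** n
    P = [[0.0] * N for _ in range(N)]
    for x in range(N):
        P[x][x] = 0.5
        for i in range(n):
            P[x][x ^ (1 << i)] = 1.0 / (2 * n)
    return P

print(conductance(lazy_hypercube(3), [1 / 8] * 8))  # approximately 1/6
```

For this walk λ_2 = 1 − 1/n = 2/3, so the lower bound 1 − 2Φ = 2/3 of Theorem 2 is attained, while the upper bound 1 − Φ²/2 = 71/72 is weak. This looseness is what limits conductance to an O(n^3) mixing bound for the cube.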

Theorem 2. For a time-reversible, lazy, ergodic Markov chain with conductance Φ, we have

$$1 - 2\Phi \le \lambda_2 \le 1 - \frac{\Phi^2}{2}.$$

While conductance has helped bound the mixing time for some complicated chains (including the three examples mentioned in the introduction), it is not a fine enough tool to give the correct bounds for some simple chains. For example, consider the lazy version of the random walk on the $2^n$ vertices of the n-cube, where in each step one picks at random one of the n neighbours of the current vertex to go to. The mixing time is known to be O(n log n). Conductance is Ω(1/n) for this example, yielding only a mixing time of $O(n^3)$ by Theorems 1 and 2.

A striking contrast is the random walk on the vertices of the cube truncated by a half-space (i.e., the set of 0-1 vectors satisfying a given linear inequality). Morris and Sinclair showed that the conductance of this walk is at least 1/p(n) for a polynomial p(·).
We now discuss a recent improvement of conductance for chains with a finite number of states; similar results hold for chains with an infinite number of states. In addition to measuring the ergodic flow from $S$ to $\bar S$, we now also check whether the flow is "well spread out", in the sense that we "block" a set $B \subseteq \bar S$ and then see whether $Q(S, \bar S \setminus B)$ is still high. We now define, for S with $0 < \pi(S) \le 3/4$,

$$\psi(S) = \sup_{\alpha \le \pi(S)} \; \inf_{B \subseteq \bar S,\ \pi(B) \le \alpha} \frac{\alpha\, Q(S, \bar S \setminus B)}{\pi(S)^2}.$$

It is easy to show that, in a lazy chain, a set B with $\pi(B) \le Q(S, \bar S)$ blocks at most 1/2 of the flow from S to $\bar S$, so we have $\psi(S) \ge \Phi(S)^2/2$. Thus, an assertion that the mixing time is $O(\log(1/\pi_0)/\min_S \psi(S))$ would be at least as strong a result as we get from Theorems 1 and 2. We prove a theorem which implies this assertion; indeed, instead of taking $\min_S \psi(S)$, the theorem takes an "average" of this quantity over different set sizes. We say that $\psi : [0, 3/4] \to [0, 1]$ is a "blocking conductance function" (b.c.f.) if (the second condition is technical)

$$\psi(S) \ge \psi(\pi(S)) \ \ \text{for all } S \text{ with } 0 < \pi(S) \le 3/4, \qquad \psi(t) \le 2\psi(t') \ \ \text{for all } 0 < t < t' \le 2t.$$



Theorem 3. If ψ is a blocking conductance function of a lazy, ergodic, time-reversible, finite Markov chain, with $\pi_0 = \min_x \pi(x)$, then the mixing time is at most

$$500 \int_{\pi_0}^{3/4} \frac{dt}{t\,\psi(t)}.$$

This has been used to improve the analysis of the ball walk for convex sets in [20], and also for some other examples in [30]. Theorem 3 has also been used to argue that the mixing time of the grid lattice (in a fixed number of dimensions), where some edges have failed according to a standard percolation model, is still within a constant factor of the mixing time of the whole lattice.

3.2. Coupling 

Another important technique for proving rapid mixing is "coupling". A coupling is a stochastic process $(X_t, Y_t)$, t = 0, 1, 2, ..., where each of $\{X_t, t = 0, 1, \dots\}$ and $\{Y_t, t = 0, 1, 2, \dots\}$ is marginally a copy of the chain. [They may be mutually dependent.] So, we run "two copies" of the chain $(X_t, Y_t)$ in tandem. If $Y_0$ is distributed according to π, the steady state distribution, then the distribution $p^{(t)}$ of $X_t$ satisfies

$$\sum_x |p^{(t)}(x) - \pi(x)| \le 2\Pr(X_t \ne Y_t).$$

To apply this, one must construct a coupling $(X_t, Y_t)$ for which $X_t$ and $Y_t$ "meet" as fast as possible. This can prove difficult. Path coupling, introduced by Bubley and Dyer [5] and described now, simplifies the task quite a bit. In path coupling, we have an underlying connected directed graph G on the set of states. (G could just be the graph of the Markov Chain.) G defines distances between pairs of states, namely the length of the shortest path in G. We only need to define a coupling of adjacent pairs of vertices, with the property that for every pair u, v of adjacent (in G) vertices, the expected distance between the next states of u and v is at most β < 1. They then show that

Theorem 4. If D is the diameter of G, then for any t > 0, $\Pr(X_t \ne Y_t) \le D\beta^t$.
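Combined with Theorem 4, a short calculation gives the number of steps needed. As an illustration (our choice of example), take the lazy walk on the n-cube with G the cube graph and the coupling that sets the same coordinate of both copies to the same random bit: two adjacent states differ in one coordinate, which is chosen with probability 1/n, so β = 1 − 1/n, and the diameter is D = n.

$$
D\beta^{t} \le \varepsilon
\iff t \ge \frac{\ln(D/\varepsilon)}{\ln(1/\beta)},
\qquad \ln(1/\beta) \ge 1-\beta
\;\Longrightarrow\;
t \ge \frac{\ln(D/\varepsilon)}{1-\beta} \text{ suffices};
\qquad D = n,\ \beta = 1-\tfrac{1}{n}:\quad t \ge n\ln(n/\varepsilon).
$$

This recovers the O(n log n) behaviour for the cube that conductance alone could not give.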

Propp and Wilson [31] have designed a method they call Coupling from the Past. This applies to chains with a partial order on the set of states with a least state 0 and a greatest state 1. They show that running two copies of the chain "backwards", one from 0 and one from 1, with a coupling satisfying a certain monotonicity condition until they "meet" gives us a good upper bound on the number of steps needed to mix. We refer the reader to [31] for details.
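The following Python sketch illustrates coupling from the past on a toy monotone chain of our choosing: a lazy walk on {0, ..., k} in which one shared uniform random number moves every copy in the same direction. Copies started at the least state 0 and the greatest state k are run from time −T to 0, reusing the same random numbers for the same times and doubling T until they agree. In the standard Propp–Wilson construction the common value is then an exact sample from the steady state (uniform here, since the transition matrix is symmetric). The monotonicity condition required in general is in [31]; `update` and `cftp` are our names.

```python
import random

def update(x, u, k):
    """Monotone update for a lazy walk on {0, ..., k}: the same u moves every x the same way."""
    if u < 1 / 3:
        return max(x - 1, 0)
    if u < 2 / 3:
        return min(x + 1, k)
    return x

def cftp(k, rng):
    """Coupling from the past for the toy chain above."""
    us = []  # us[s] is the randomness for the step from time -(s+1) to time -s
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())      # extend further into the past; keep the rest
        lo, hi = 0, k                    # copies started at the least and greatest states
        for s in range(T - 1, -1, -1):   # run from time -T up to time 0
            lo, hi = update(lo, us[s], k), update(hi, us[s], k)
        if lo == hi:                     # by monotonicity, every start has coalesced
            return lo
        T *= 2

rng = random.Random(2024)
samples = [cftp(3, rng) for _ in range(4000)]
```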

3.3. Other methods 

One way to prove a lower bound on conductance for a chain with a finite set of states X is to construct a family of $|X|^2$ paths, one from each state to each other state, using as edges the transitions of the Markov Chain, so that no transition is "overloaded" by too many paths. We do not supply here any more details of this technique, referred to as the method of "canonical paths" and used by Jerrum and Sinclair.

We may look upon the construction of these paths as routing a multi-commodity flow through the network and apply techniques from Network Flows. [33] pursues this. [13] uses different measures of congestion to achieve improved results in some cases, and these methods are applied in [15].
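A standard illustration of canonical paths (our example, not from the text) is the n-cube with "bit-fixing" paths, which correct the bits in which x and y differ from the lowest to the highest. The Python sketch below counts how many of the paths use each directed edge.

```python
from collections import Counter
from itertools import product

def bit_fixing_path(x, y, n):
    """Path from x to y in the n-cube correcting differing bits from lowest to highest."""
    path = [x]
    for i in range(n):
        if ((path[-1] ^ y) >> i) & 1:
            path.append(path[-1] ^ (1 << i))
    return path

def edge_loads(n):
    """Number of canonical paths using each directed edge (u, v) of the n-cube."""
    load = Counter()
    for x, y in product(range(2 ** n), repeat=2):
        if x != y:
            p = bit_fixing_path(x, y, n)
            for u, v in zip(p, p[1:]):
                load[(u, v)] += 1
    return load

loads = edge_loads(3)
print(len(loads), max(loads.values()))  # prints 24 4
```

For n = 3 each of the 24 directed edges carries exactly 2^{n−1} = 4 of the 56 paths, so no transition is overloaded.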

Another important method is the use of logarithmic Sobolev inequalities, where we use the (relative) entropy $\mathrm{Ent}(p^{(t)}) = \sum_x p^{(t)}(x) \log \frac{p^{(t)}(x)}{\pi(x)}$ as the measure of distance. It is known that for ergodic Markov Chains this distance declines exponentially, i.e., there is a constant α ∈ (0, 1) such that

$$\mathrm{Ent}(p^{(t)}) \le \alpha^t\, \mathrm{Ent}(p^{(0)}).$$

Note that $\mathrm{Ent}(p^{(0)}) \le \log(1/\pi_0)$. So it suffices to choose $t = (\log\log(1/\pi_0) + \log(1/\varepsilon))/(1 - \alpha)$ to reduce the entropy to ε; the dependence on $1/\pi_0$ is thus better. But we need to determine α, which is known only for simple chains. It is known that $\alpha \ge \lambda_2$, so the most that this method could save over using something like Theorem 1 is the $\log(1/\pi_0)$ factor. [12] contains several comparisons between log-Sobolev inequalities, eigenvalue bounds and conductance. The log-Sobolev inequality has also been used to prove better bounds on the Metropolis version of the coordinate random walk for log-concave functions.
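A small numerical illustration (Python; our example) of the entropy distance: for the lazy walk on a 4-cycle started at a single state, Ent(p^{(0)}) equals log(1/π_0) = log 4 and then decreases monotonically. The constant α is not computed here; the helper `steps_needed` (our name) only evaluates the formula for t given above.

```python
import math

def relative_entropy(p, pi):
    """Ent(p) = sum_x p(x) log(p(x)/pi(x)), with the convention 0 log 0 = 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, pi) if px > 0)

def evolve(p, P):
    """One step p -> pP."""
    n = len(p)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

def steps_needed(pi0, eps, alpha):
    """t = (log log(1/pi0) + log(1/eps)) / (1 - alpha); needs pi0 < 1/e."""
    return (math.log(math.log(1 / pi0)) + math.log(1 / eps)) / (1 - alpha)

# Lazy walk on a 4-cycle, started from a single state.
P = [[0.5 if y == x else 0.25 if (y - x) % 4 in (1, 3) else 0.0 for y in range(4)]
     for x in range(4)]
pi = [0.25] * 4
p = [1.0, 0.0, 0.0, 0.0]
ents = []
for t in range(20):
    ents.append(relative_entropy(p, pi))
    p = evolve(p, P)
```

Since $\alpha^t \le e^{-(1-\alpha)t}$, the value returned by `steps_needed` guarantees $\alpha^t \log(1/\pi_0) \le \varepsilon$.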

For the random walk on the cube, a simple coupling argument, which moves both $X_t$ and $Y_t$ in the same coordinate, trying to make them equal, shows that the mixing time is O(n log n). Some sophisticated Fourier transform methods have been used to get much more exact results here, and the results are applicable in other contexts too.
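The cube coupling can be simulated directly. In the Python sketch below (the coupling details are our concrete choice), both copies choose the same coordinate i and set it to the same random bit; each copy alone is then the lazy walk, and once coordinate i has been chosen the copies agree there forever. Started at opposite corners, the copies meet exactly when every coordinate has been chosen, a coupon-collector time with mean n·H_n = O(n log n).

```python
import random

def coupling_time(n, rng):
    """Steps until two coupled lazy walks on {0,1}^n, started at opposite corners, meet.

    Both copies pick the same coordinate i and set it to the same random bit; each
    copy on its own flips coordinate i with probability 1/(2n), i.e. is the lazy walk.
    """
    x, y = [0] * n, [1] * n
    t = 0
    while x != y:
        i = rng.randrange(n)
        b = rng.randrange(2)
        x[i] = b
        y[i] = b
        t += 1
    return t

n = 10
rng = random.Random(3)
mean = sum(coupling_time(n, rng) for _ in range(2000)) / 2000
harmonic = sum(1 / k for k in range(1, n + 1))
print(mean, n * harmonic)  # empirical mean vs. coupon-collector expectation n * H_n
```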
A traditional approach to sampling from a probability distribution involves so-called "Stopping Rules", where one specifies a rule for when to stop the Markov Chain and shows that if we follow the rule, we sample (exactly) from the desired distribution. [2] contains results about the expected time needed for certain stopping rules, which then serves as an upper bound on the number of steps needed to converge.

We also mention two general techniques for deriving convergence rates of a Markov Chain from knowledge of the convergence rates of a simpler-to-analyze chain. The first one is called Comparison and is developed in [11]. The second technique is called Decomposition [22]; here one decomposes the chain into chains on subsets of states and derives a bound on the convergence rate of the whole chain based on the rates for the "sub-chains" and the interconnections between them.

4. Solution of sampling and counting problems 

PERMANENT 

We consider the permanent of an n × n 0-1 matrix A. We may define a bipartite graph corresponding to the matrix. Each $\sigma \in S_n$ with $A_{i\sigma(i)} = 1$ for all i corresponds to a perfect matching in the graph. Let $\mathcal{M}_n$ be the set of perfect matchings in the graph. Unfortunately, no rapidly mixing Markov Chain with only $\mathcal{M}_n$ as the set of states is known. Broder [7] first defined the following Markov Chain. We also include the set $\mathcal{M}_{n-1}$ of "near-perfect" matchings (a near-perfect matching has n - 1 edges, no two incident to the same vertex). Transitions of the Markov Chain are as follows: in any current state M, we pick an edge e = (u, v) of the graph uniformly at random (all edges are equally likely) and

• if $M \in \mathcal{M}_n$ and e ∈ M, move to $M' = M - e$;

• if $M \in \mathcal{M}_{n-1}$ and u and v are both unmatched in M, then move to $M' = M + e$;

• if $M \in \mathcal{M}_{n-1}$, u is matched to w in M and v is unmatched, then move to $M' = (M + e) - (u, w)$; make a symmetric move if v is matched and u unmatched;

• in all other cases, stay at M.

[20] showed that if A is dense (every row and every column has at least n/2 1's), then the chain above mixes rapidly and, in addition, that $|\mathcal{M}_{n-1}| \le p(n)|\mathcal{M}_n|$ for a polynomial p(·). Thus, rejection sampling (accept the result of a run of the chain if the result is in $\mathcal{M}_n$) yields a polynomial time sampling procedure.
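Broder's transitions translate directly into code. The Python sketch below (names ours) represents a matching as a frozenset of (left, right) pairs of the bipartite graph and implements the rejection step by discarding runs that end at a near-perfect matching. The run length of 60 steps is an arbitrary choice for this tiny example, not a proven bound.

```python
import random
from itertools import permutations

def broder_step(M, edges, n, rng):
    """One transition of the chain on perfect and near-perfect matchings.

    M is a frozenset of (left, right) pairs; edges lists the edges of the graph.
    """
    u, v = rng.choice(edges)                  # e = (u, v), all edges equally likely
    if len(M) == n:                           # M perfect: only e in M lets us move
        return M - {(u, v)} if (u, v) in M else M
    mate_left = dict(M)                       # left vertex -> its right partner
    mate_right = {r: l for l, r in M}         # right vertex -> its left partner
    u_free, v_free = u not in mate_left, v not in mate_right
    if u_free and v_free:                     # both unmatched: add e
        return M | {(u, v)}
    if v_free:                                # u matched to w, v unmatched
        return (M - {(u, mate_left[u])}) | {(u, v)}
    if u_free:                                # the symmetric move
        return (M - {(mate_right[v], v)}) | {(u, v)}
    return M                                  # all other cases: stay

def sample_perfect_matching(start, edges, n, steps, rng):
    """Run the chain from `start`; keep the final state only if it is perfect."""
    M = start
    for _ in range(steps):
        M = broder_step(M, edges, n, rng)
    return M if len(M) == n else None

# Example: K_{3,3} (the all-ones 3 x 3 matrix), started at the identity matching.
n = 3
edges = [(i, j) for i in range(n) for j in range(n)]
start = frozenset((i, i) for i in range(n))
rng = random.Random(7)
runs = [sample_perfect_matching(start, edges, n, 60, rng) for _ in range(4000)]
accepted = [M for M in runs if M is not None]
all_perfect = {frozenset(zip(range(n), p)) for p in permutations(range(n))}
```

On K_{3,3} there are 6 perfect and 18 near-perfect matchings. Each move and its reverse have the same probability 1/|E|, so the steady state is uniform over all 24; roughly a quarter of the runs are accepted, and each perfect matching comes out with frequency close to 1/6.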

Jerrum, Sinclair and Vigoda [22] develop an algorithm for the general 0-1 permanent (including the non-dense case). Here is a very brief sketch of their algorithm. An edge-weighting w assigns a (positive) real weight w(e) to each edge. For a matching M, its weight is $w(M) = \prod_{e \in M} w(e)$. For a set S of matchings, $w(S) = \sum_{M \in S} w(M)$. Finally, for each pair of vertices (u, v), define w'(u, v) to be the ratio of the weight of all perfect matchings to the weight of all near-perfect matchings which leave u, v unmatched. Then define the "modified weight" w'(M) of a matching M to be w(M) if M is perfect and w(M)w'(u, v) if M leaves u, v unmatched. They first show that a Metropolis version of the above random walk, sampling according to w'(M), mixes rapidly. But the w' are not known; they argue that if we start with the complete graph and go through a sequence of graphs, where in each step we lower the weights of the non-edges of G by a factor, then we can successively estimate w' for each edge-weighting (of the complete graph) in the sequence. The final element of the sequence has low enough weights for the non-edges that it gives a good approximation to the permanent.
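The role of the factors w'(u, v) can be checked by brute force on a small complete bipartite graph. The Python sketch below (names ours; exhaustive enumeration, so tiny n only) computes them exactly and confirms a direct consequence of the definition: each of the n² "hole patterns" (u, v) receives the same total modified weight as the perfect matchings, so the perfect matchings carry a 1/(n² + 1) fraction of the total.

```python
from itertools import permutations

def matching_weight(W, rows, cols):
    """Sum over perfect matchings of rows to cols of prod W[r][c] (brute force)."""
    total = 0.0
    for perm in permutations(cols):
        prod = 1.0
        for r, c in zip(rows, perm):
            prod *= W[r][c]
        total += prod
    return total

def hole_factors(W):
    """Perfect-matching weight, hole weights, and w'(u, v) = full / hole weight."""
    n = len(W)
    full = matching_weight(W, range(n), range(n))
    holes = {}
    for u in range(n):
        for v in range(n):
            rows = [r for r in range(n) if r != u]   # row u left unmatched
            cols = [c for c in range(n) if c != v]   # column v left unmatched
            holes[(u, v)] = matching_weight(W, rows, cols)
    return full, holes, {uv: full / h for uv, h in holes.items()}

W = [[1.0, 2.0, 0.5],
     [0.3, 1.0, 1.0],
     [2.0, 1.0, 0.7]]
full, holes, factor = hole_factors(W)
near = sum(holes[uv] * factor[uv] for uv in holes)  # modified weight of near-perfect matchings
print(full / (full + near))  # fraction of modified weight on perfect matchings
```

The printed fraction is 1/(n² + 1) = 1/10 up to rounding, whatever the (positive) weights; this is what makes rejection sampling efficient once the w' are known.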
THE ISING MODEL 

Recall the subgraph sampling problem in Section 1. Here is the random walk Jerrum and Sinclair use. The states of the Markov Chain are the subsets of E. Their chain is the Metropolis version of the following simple Markov Chain, whose steady state probabilities are uniform over all subsets of the edges: at any current subset T of E, pick uniformly at random an edge e ∈ E; if e ∈ T, then go to $T' = T - e$, otherwise go to $T' = T + e$. They also make the chain lazy. The proof of a lower bound on conductance relies on a canonical paths argument.
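The subgraph chain can be written down transition by transition. The Python sketch below (names ours) computes w(T) and the next-state distribution of the lazy Metropolis edge-flip chain, then checks detailed balance w(T)P(T, T') = w(T')P(T', T) on a triangle, so w (normalized) is indeed the steady state. The laziness probability 1/2 is our choice.

```python
import random
from collections import Counter
from itertools import combinations

def subgraph_weight(T, w, mu):
    """w(T) = mu^{|odd(T)|} * prod_{e in T} w(e) for an edge subset T."""
    degree = Counter()
    prod = 1.0
    for e in T:
        a, b = e
        degree[a] += 1
        degree[b] += 1
        prod *= w[e]
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return mu ** odd * prod

def transition_probs(T, w, mu):
    """Next-state distribution of the lazy Metropolis edge-flip chain from T."""
    edges = list(w)
    m = len(edges)
    wT = subgraph_weight(T, w, mu)
    probs = Counter()
    for e in edges:
        T2 = T - {e} if e in T else T | {e}          # flip edge e
        accept = min(1.0, subgraph_weight(T2, w, mu) / wT)
        probs[T2] += 0.5 / m * accept                 # 1/2 for laziness, 1/m for e
    probs[T] += 1.0 - sum(probs.values())             # lazy or rejected moves stay at T
    return probs

def step(T, w, mu, rng):
    """Sample one transition of the chain."""
    probs = transition_probs(T, w, mu)
    states = list(probs)
    return rng.choices(states, weights=[probs[s] for s in states])[0]

w = {(0, 1): 0.5, (1, 2): 2.0, (0, 2): 1.0}  # a triangle with arbitrary positive weights
mu = 0.3
subsets = [frozenset(c) for k in range(4) for c in combinations(w, k)]
P = {T: transition_probs(T, w, mu) for T in subsets}
```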

The algorithm preferred by physicists is the one due to Swendsen and Wang. This algorithm switches the signs on large blocks of vertices of the graph at once. But while it seems to work well in practice, no proof of rapid mixing is known.

CONVEX SETS, LOG-CONCAVE FUNCTIONS 

Consider the ball walk in a convex set K in $\mathbb{R}^n$ with balls of radius δ. We use the notation $P_{xy}$ for the transition probability density from x to y here. The conductance of a (measurable) subset S of K is now defined as

$$\Phi(S) = \frac{\int_{x \in S} \int_{y \in K \setminus S} \pi(x) P_{xy}\, dy\, dx}{\min(\pi(S),\ 1 - \pi(S))}.$$

Let ∂S be the boundary of S interior to K. Since points x ∈ S on or near ∂S intuitively have a high $\int_{K \setminus S} P_{xy}\, dy$, a lower bound on $\mathrm{Vol}_{n-1}(\partial S)$ would seem to imply a lower bound on conductance. This is indeed the case. Lower bounds on $\mathrm{Vol}_{n-1}(\partial S)$ have been the subject of much effort. The most general result known is the following.

Theorem 5 (Isoperimetry). Suppose K is a compact convex set in $\mathbb{R}^n$ of diameter d and F is a positive real-valued log-concave function on K. Then for any measurable S ⊆ K with $\int_S F \le (1/2)\int_K F$, and measurable boundary ∂S interior to K, we have

The theorem was first proved for the case F = 1 by Lovasz and Simonovits [27] and independently by Khachiyan and Karzanov [25]. The result was generalized to the case of general log-concave measures F by Applegate and Kannan [4] using the same techniques. We may add an extra factor of $\ln(\int_K F / \int_S F)$ to the right hand side; this was proved independently in [26] and also by Bobkov. The most recent algorithm for computing the volume of convex sets is in [24], where references to earlier papers may be found.
There are many other counting problems on which progress has been made using this method. Again, we are not able to present a comprehensive review here.

A notable result is the one for the truncated cube, already mentioned in Section 3.1. Another example of interest is Contingency Tables, where we are given m, n (positive integers) and the row and column sums of an m × n matrix A. The problem is to sample uniformly at random from the set of m × n matrices with non-negative integer entries with these row and column sums. The problem remains open, but there are several partial results [14].

There are many tiling problems, where the problem is to pick a random tiling of, say, a large square in the plane by dominoes of a given shape. These problems arise in Statistical Mechanics. For regular shapes, it is often possible to devise a polynomial time algorithm to count the number of tilings exactly. But it is important to devise algorithms with low polynomial time bounds. There has been much progress here; see [28] and references there. Random generation of colorings and independent sets of a graph has received much attention lately due to connections to Statistical Mechanics.

Acknowledgment. I thank Ravi Montenegro for suggesting some changes in the 
manuscript. 